Methods

Data were imported using the data_gathering.RMD script; see that script for details of collection.

Data Shaping

Take in the raw data, add a parseable timestamp, and filter on the date range and client_ids.
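A minimal sketch of this step, assuming column names (`created_time`, `from_id`) matching the Facebook export shown later in this notebook; the actual shaping lives in the chunk this section summarizes.

```r
# Hypothetical sketch: parse the ISO-8601 created_time strings
# (e.g. "2013-09-13T21:00:37+0000") into POSIXct timestamps, then keep
# only posts from tracked client pages inside the study window.
shape_raw <- function(raw, client_ids, start_date, end_date) {
  raw$timestamp <- as.POSIXct(raw$created_time,
                              format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
  raw[raw$from_id %in% client_ids &
        raw$timestamp >= start_date &
        raw$timestamp <= end_date, ]
}
```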

Function Definition

Define functions to create posts-per-day-of-week bar graphs and time-series engagement line graphs.
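The counting core of the day-of-week helper can be sketched as below; the name `posts_per_weekday` and the `timestamp` column are assumptions, and the notebook's actual `day_of_week()` wraps counts like these in a ggplot bar chart.

```r
# Hypothetical base-R sketch of the weekday counting behind day_of_week().
# Assumes a POSIXct 'timestamp' column; weekday names are locale-dependent.
posts_per_weekday <- function(df) {
  table(factor(weekdays(df$timestamp),
               levels = c("Monday", "Tuesday", "Wednesday", "Thursday",
                          "Friday", "Saturday", "Sunday")))
}
```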

Additional Data Shaping for Engagement

Shape the data into a vertical (long) format.
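A toy sketch of what "vertical" means here, using base R's `reshape()`; the metric column names are assumptions based on the Facebook fields shown later, and the resulting `Engagement`/`Number` columns mirror the ones the summary plots below expect.

```r
# Hypothetical sketch: stack wide engagement metrics (one column per
# metric) into a long/vertical format with one row per metric per post.
wide <- data.frame(id = c("p1", "p2"),
                   likes_count = c(10, 20),
                   comments_count = c(1, 2),
                   shares_count = c(5, 3))
long <- reshape(wide, direction = "long",
                varying = c("likes_count", "comments_count", "shares_count"),
                v.names = "Number", timevar = "Engagement",
                times = c("likes_count", "comments_count", "shares_count"),
                idvar = "id")
```

`tidyr::gather()` would do the same in the tidyverse style used elsewhere in this notebook.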

Matrix Plots of Engagement

The first plot shows aggregated engagement by content type. The second shows engagement by content type for the client (Labatt).

  • As Bud Light and Michelob ULTRA are the two companies with the highest engagement, they are the natural points of comparison.

  • Looking at engagement by content type, we see that Labatt is garnering its most significant engagement on Photos, Videos, and Links.

  • [ ] TODO: we need to compare posting activity with engagement activity

Summary Plots

Horizontal stacked bar chart for total engagement comparison of all companies

reorder_size <- function(x) {
  factor(x, levels = names(sort(table(x))))
}
p <- summary_stats %>%
  filter(Engagement != "Total.Posts") %>%
  ggplot(., aes(x = Company, y = Number, fill = Engagement)) +
  geom_bar(stat = "identity") +
  xlab("") + ylab("") +
  coord_flip()

plot(p)

Day of Week

Total posts per day of the week.

# without brand ID these are uninformative
for(i in seq_along(df_names)) {
  p <- day_of_week(df_names[i], client_names[i])
  plot(p)
}

p <- ggplot(data = all_companies_ts, aes(x = wday(timestamp, label = TRUE))) +
  geom_bar(aes(fill = ..count..)) +
  theme(legend.position = "none") +
  xlab("Day of the Week") + ylab("Number of Posts") +
  scale_fill_gradient(low = "midnightblue", high = "aquamarine4") + 
  facet_wrap(~from_name, ncol = 4) +
  ggtitle("Daily Posting Activity by Brand")
plot(p)

dowDat <- select(all_companies_ts, total_engagement, from_name, timestamp)
dowDat$dow <- wday(dowDat$timestamp, label=TRUE)
dowDat <- aggregate(total_engagement~dow+from_name, data=dowDat, FUN=mean)

p <- ggplot(dowDat, aes(x = dow, y = total_engagement)) +
  geom_bar(stat="identity", aes(fill = total_engagement)) + 
  facet_grid(~from_name) + 
  ggtitle('Engagements Per Day of Week') +
  theme(legend.position = "none") +
  xlab("Day of the Week") + ylab("Number of Engagements") +
  scale_fill_gradient(low = "midnightblue", high = "aquamarine4")
plot(p)

  • [ ] TODO: Create a posts-vs-engagement scatter plot to answer the question: on days with many posts, do we also get high engagement?

  • [ ] TODO: With that data, identify which posts get the most engagement; compare the top- and bottom-engagement posts and the qualities they share or differ by.
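One possible starting point for the first TODO above, with names (`timestamp`, `from_name`, `total_engagement`) assumed to match the objects used earlier; the output pairs daily post counts with daily engagement, ready for a scatter plot.

```r
# Hypothetical sketch: per brand and day, count posts and sum engagement,
# so posting volume can be plotted against engagement.
post_vs_engagement <- function(df) {
  df$day <- as.Date(df$timestamp)
  agg <- aggregate(total_engagement ~ day + from_name, data = df,
                   FUN = function(x) c(n_posts = length(x),
                                       engagement = sum(x)))
  data.frame(day = agg$day,
             from_name = agg$from_name,
             n_posts = agg$total_engagement[, "n_posts"],
             engagement = agg$total_engagement[, "engagement"])
}
```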

Timeseries Engagement

Time-series line plots of engagement for each client.

for(i in seq_along(df_names)) {
  p <- timeseries_engagement(client_names_proper[i])
  plot(p)
}

Initial Visualization of Engagement over Time

Test visualization; it showed a spike in engagement for Bud Light in August 2016.

all_companies_ts <- all_companies_ts %>%
  filter(from_id %in% client_ids) %>%
  mutate(month = as.Date(cut(timestamp, breaks = "month")))


ggplot(all_companies_ts, aes(x = month, y = total_engagement)) +
  geom_line(aes(group = from_name, color = factor(from_name)))

all_companies_ts %>%
  select(from_name, month, total_engagement) %>%
  group_by(from_name,month) %>%
  summarise(totEng = sum(total_engagement)) %>%
  ggplot(., aes(x = month, y = totEng)) +
   geom_point(aes(color = from_name)) +
  geom_smooth(aes(color = from_name), se = FALSE)

all_companies_ts %>%
  select(from_name, month, total_engagement) %>%
  filter(from_name != "Bud Light" ) %>%
  filter(from_name != "Michelob ULTRA") %>%
  group_by(from_name,month) %>%
  summarise(totEng = sum(total_engagement)) %>%
  ggplot(., aes(x = month, y = totEng)) +
   geom_point(aes(color = from_name)) +
  geom_smooth(aes(color = from_name)) +
  ggtitle("Monthly Facebook Engagement w/o Bud & MichULTRA")

  • This is an interesting drop of ~30% over the first 6 months of 2015. The brand has still not recovered from that reduction.
  • What is different about the content during this period?

  • Might be valuable to look back at the entire timeseries for periods of distinct dynamism.

Labatt Wordclouds

Removed the date filter because Labatt, unlike the brands in the previous analysis, does not have a significant inflection point.

labatt$timestamp <- date(labatt$timestamp)

labatt_clean_pre <- str_replace_all(labatt$message, "@\\w+", "")
labatt_clean_pre <- gsub("&amp", "", labatt_clean_pre)
labatt_clean_pre <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", labatt_clean_pre)
labatt_clean_pre <- gsub("@\\w+", "", labatt_clean_pre)
labatt_clean_pre <- gsub("[[:punct:]]", "", labatt_clean_pre)
labatt_clean_pre <- gsub("[[:digit:]]", "", labatt_clean_pre)
labatt_clean_pre <- gsub("http\\w+", "", labatt_clean_pre)
labatt_clean_pre <- gsub("[ \t]{2,}", " ", labatt_clean_pre)
labatt_clean_pre <- gsub("^\\s+|\\s+$", "", labatt_clean_pre)

labatt_corpus_pre <- Corpus(VectorSource(labatt_clean_pre))
labatt_corpus_pre <- tm_map(labatt_corpus_pre, removePunctuation)
labatt_corpus_pre <- tm_map(labatt_corpus_pre, content_transformer(tolower))
labatt_corpus_pre <- tm_map(labatt_corpus_pre, removeWords, stopwords("english"))
labatt_corpus_pre <- tm_map(labatt_corpus_pre, removeWords, c("amp", "2yo", "3yo", "4yo"))
labatt_corpus_pre <- tm_map(labatt_corpus_pre, stripWhitespace)

pal <- brewer.pal(9,"YlGnBu")
pal <- pal[-(1:4)]
set.seed(123)

wordcloud(words = labatt_corpus_pre, scale=c(5,0.1), max.words=25, random.order=FALSE, 
          rot.per=0.35, use.r.layout=FALSE, colors=pal)

Point Graphs for Posts

Displays engagement per post to find outliers.

p <- ggplot(all_companies_ts, aes(x = month, y = total_engagement)) +
  geom_point(aes(color = from_name)) +
  xlab("Year") + ylab("Total Engagement") + 
  theme(legend.title=element_blank(), 
        legend.text=element_text(size=12), 
        legend.position=c(0.18, 0.77), 
        legend.background=element_rect(fill=alpha('gray', 0)))
plot(p)

Total Engagement Line

q <- aggregate(total_engagement ~ month + from_name,
               data = all_companies_ts, FUN = sum)

ggplot(q, aes(x = month, y = total_engagement)) +
  geom_line(aes(color = from_name)) +
  ylab("Total Engagement") + xlab("Year") +
  theme(legend.title=element_blank(), 
        legend.text=element_text(size=12), 
        legend.position=c(0.18, 0.77), 
        legend.background=element_rect(fill=alpha('gray', 0)))

Engagement by Company

### Molson Content Over Time ###
t <- all_companies_ts %>%
  filter(., from_name == "Molson Canadian")
t <- data.frame(table(t$month, t$type))

t$Var1 <- date(t$Var1)
ggplot(t, aes(x = Var1, y = Freq, group = Var2)) +
  geom_line(aes(color=Var2)) +
  ggtitle('Molson Content Over Time') +
  xlab("Year") + ylab("Post Frequency") +
  theme(legend.title=element_blank(), 
        legend.text=element_text(size=12), 
        legend.position=c(0.18, 0.77), 
        legend.background=element_rect(fill=alpha('gray', 0)))

### Tristen's graphs ###

### Labatt Content Over Time ###
t <- all_companies_ts %>%
  filter(., from_name == "Labatt USA")
t <- data.frame(table(t$month, t$type))

t$Var1 <- date(t$Var1)
ggplot(t, aes(x = Var1, y = Freq, group = Var2)) +
  geom_line(aes(color=Var2)) +
  ggtitle('Labatt Content Over Time') +
  xlab("Year") + ylab("Post Frequency") +
  theme(legend.title=element_blank(), 
        legend.text=element_text(size=12), 
        legend.position=c(0.18, 0.77), 
        legend.background=element_rect(fill=alpha('gray', 0)))

### Michelob ULTRA Content Over Time ###
t <- all_companies_ts %>%
  filter(., from_name == "Michelob ULTRA")
t <- data.frame(table(t$month, t$type))

t$Var1 <- date(t$Var1)
ggplot(t, aes(x = Var1, y = Freq, group = Var2)) +
  geom_line(aes(color=Var2)) +
  ggtitle('Michelob ULTRA Content Over Time') +
  xlab("Year") + ylab("Post Frequency") +
  theme(legend.title=element_blank(), 
        legend.text=element_text(size=12), 
        legend.position=c(0.18, 0.77), 
        legend.background=element_rect(fill=alpha('gray', 0)))

### Bud Light Content Over Time ###
t <- all_companies_ts %>%
  filter(., from_name == "Bud Light")
t <- data.frame(table(t$month, t$type))

t$Var1 <- date(t$Var1)
ggplot(t, aes(x = Var1, y = Freq, group = Var2)) +
  geom_line(aes(color=Var2)) +
  ggtitle('Bud Light Content Over Time') +
  xlab("Year") + ylab("Post Frequency") +
  theme(legend.title=element_blank(), 
        legend.text=element_text(size=12), 
        legend.position=c(0.18, 0.77), 
        legend.background=element_rect(fill=alpha('gray', 0)))

Pulling #hashtags

I found an example on Stack Overflow:

> tweets <- c("New R job: Statistical and Methodological Consultant at the Center for Open Science http://www.r-users.com/jobs/statistical-methodological-consultant-center-open-science/ … #rstats #jobs","New R job: Research Engineer/Applied Researcher at eBay http://www.r-users.com/jobs/research-engineerapplied-researcher-ebay/ … #rstats #jobs")
> match <- regmatches(tweets,gregexpr("#[[:alnum:]]+",tweets))
> match
[[1]]
[1] "#rstats" "#jobs"  

[[2]]
[1] "#rstats" "#jobs"  
> unlist(match)
[1] "#rstats" "#jobs"   "#rstats" "#jobs"  


tweets <- LabattUSA_timeline$text
match <- regmatches(tweets,gregexpr("#[[:alnum:]]+",tweets))

# Convert the list to a corpus
# new_corpus <- as.VCorpus(new_list)  from Stackoverflow (http://stackoverflow.com/questions/34061912/how-transform-a-list-into-a-corpus-in-r)

new_corpus <- as.VCorpus(match)
class(new_corpus)
inspect(new_corpus)

EnsurePackage <- function(x) {
  # EnsurePackage(x) - Installs and loads a package if necessary
  # Args:
  #   x: name of package

  x <- as.character(x)
  if (!require(x, character.only=TRUE)) {
    install.packages(pkgs=x, repos="http://cran.r-project.org")
    require(x, character.only=TRUE)
  }
}

MakeWordCloud <- function(corpus) {
  # Make a word cloud
  #
  # Args:
  #   textVec: a text vector
  #
  # Returns:
  #   A word cloud created from the text vector
  
  EnsurePackage("tm")
  EnsurePackage("wordcloud")
  EnsurePackage("RColorBrewer")
  
  corpus <- tm_map(corpus, content_transformer(function(x) {
    removeWords(x, c("via", "rt", "mt"))
  }))
  
  ap.tdm <- TermDocumentMatrix(corpus)
  ap.m <- as.matrix(ap.tdm)
  ap.v <- sort(rowSums(ap.m), decreasing=TRUE)
  ap.d <- data.frame(word = names(ap.v), freq=ap.v)
  table(ap.d$freq)
  pal2 <- brewer.pal(8, "Dark2")
  
  wordcloud(ap.d$word, ap.d$freq, 
            scale=c(8, .2), min.freq = 3, 
            max.words = Inf, random.order = FALSE, 
            rot.per = .15, colors = pal2)
}

MakeWordCloud(new_corpus)

Mosaic Plot Experiment

Kevin's Questions

load('processed_data/bud_fb.RData')
bud$total_engagement <- rowSums(bud[, 9:11])  # likes, comments, shares counts
z <- bud %>%
  arrange(desc(total_engagement))
head(z)
##       from_id from_name
## 1 54876245094 Bud Light
## 2 54876245094 Bud Light
## 3 54876245094 Bud Light
## 4 54876245094 Bud Light
## 5 54876245094 Bud Light
## 6 54876245094 Bud Light
##                                                                      message
## 1                                                      Welcome to your cool.
## 2                                               Now, that's a smart phone...
## 3                                     This isn’t your average birthday cake.
## 4                                                                       <NA>
## 5                                        You’ll never believe what we found…
## 6 Whether you prefer bottles or cans, it's what's on the inside that counts.
##               created_time  type
## 1 2013-09-13T21:00:37+0000 photo
## 2 2012-11-03T19:00:01+0000 photo
## 3 2013-08-10T21:00:35+0000 photo
## 4 2013-11-18T17:30:09+0000 photo
## 5 2012-10-15T20:15:09+0000 photo
## 6 2013-12-30T17:30:15+0000 photo
##                                                                                                   link
## 1 https://www.facebook.com/BudLight/photos/a.104089690094.103626.54876245094/10151779634090095/?type=3
## 2 https://www.facebook.com/BudLight/photos/a.104089690094.103626.54876245094/10151287515990095/?type=3
## 3 https://www.facebook.com/BudLight/photos/a.104089690094.103626.54876245094/10151719654105095/?type=3
## 4 https://www.facebook.com/BudLight/photos/a.104089690094.103626.54876245094/10151899712140095/?type=3
## 5 https://www.facebook.com/BudLight/photos/a.104089690094.103626.54876245094/10151263636995095/?type=3
## 6 https://www.facebook.com/BudLight/photos/a.104089690094.103626.54876245094/10151982542270095/?type=3
##                              id
## 1 54876245094_10151779634155095
## 2 54876245094_10151287772655095
## 3 54876245094_10151719654175095
## 4 54876245094_10151899712210095
## 5 54876245094_10151263637030095
## 6 54876245094_10151982542340095
##                                            story likes_count
## 1    Bud Light with Casey Clifton and 47 others.      578713
## 2      Bud Light with Rowdy Wiles and 36 others.      262744
## 3  Bud Light with Daniela Morales and 39 others.      222208
## 4    Bud Light with Kevin Linhart and 46 others.      225471
## 5  Bud Light with Jeffery Devskie and 40 others.      224766
## 6 Bud Light with Shannon Marshall and 40 others.      220762
##   comments_count shares_count total_engagement
## 1           8149       173568           760430
## 2           3772        21279           287795
## 3           4570        46476           273254
## 4           7170        35675           268316
## 5           6668        25496           256930
## 6           7785        16615           245162